Automated Speech Act Classification For Online Chat

Authors

  • Cristian Moldovan
  • Vasile Rus
  • Arthur C. Graesser
Abstract

In this paper, we investigate the use of supervised machine learning methods to automatically classify online chat posts into speech act categories, which are semantic categories that indicate a speaker's intention. Supervised methods presuppose annotated training data from which a learning algorithm estimates the parameters of a model proposed for the task at hand. We used the annotated Linguistic Data Consortium (LDC) chat corpus to tune our model, which rests on the assumption that the first few tokens (words) of a chat post are highly predictive of the post's speech act category. We report results for predicting the speech act category of chat posts with two machine learning algorithms, Naïve Bayes and Decision Trees, combined with several variants of the basic model whose features are the first 2 to 6 words of a post and their part-of-speech tags. The results support our initial assumption: the first words of an utterance predict its speech act category with very good accuracy.
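
The basic model described in the abstract (first words of a post as features, Naïve Bayes as the learner) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes whitespace tokenization, uses only the first-k word features (no part-of-speech tags), implements a categorical Naïve Bayes with add-one smoothing, and trains on a tiny hypothetical set of posts whose labels are illustrative rather than taken from the LDC corpus. The names `first_k_features` and `FirstWordsNaiveBayes` are hypothetical.

```python
import math
from collections import Counter, defaultdict


# Hypothetical helper: the first k whitespace tokens of a post, one feature per position.
# Posts shorter than k are padded with "<none>" so every post yields exactly k features.
def first_k_features(post: str, k: int) -> list[str]:
    tokens = post.lower().split()
    return [tokens[i] if i < len(tokens) else "<none>" for i in range(k)]


class FirstWordsNaiveBayes:
    """Hypothetical categorical Naive Bayes over the first k words (add-one smoothing)."""

    def __init__(self, k: int = 2):
        self.k = k
        self.class_counts = Counter()
        # (label, position) -> counts of the words seen at that position for that label
        self.word_counts = defaultdict(Counter)
        # position -> set of words observed at that position in training
        self.values = defaultdict(set)

    def fit(self, posts: list[str], labels: list[str]) -> "FirstWordsNaiveBayes":
        for post, label in zip(posts, labels):
            self.class_counts[label] += 1
            for i, word in enumerate(first_k_features(post, self.k)):
                self.word_counts[(label, i)][word] += 1
                self.values[i].add(word)
        return self

    def predict(self, post: str) -> str:
        total = sum(self.class_counts.values())
        words = first_k_features(post, self.k)
        best_label, best_score = None, -math.inf
        for label, n_label in self.class_counts.items():
            # log P(label) + sum_i log P(word_i | label), computed in log space
            score = math.log(n_label / total)
            for i, word in enumerate(words):
                # the extra +1 in the denominator reserves probability mass for unseen words
                denom = n_label + len(self.values[i]) + 1
                score += math.log((self.word_counts[(label, i)][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data: hypothetical posts and labels, not drawn from the LDC corpus.
posts = ["what time is it", "what are you doing", "do you like jazz",
         "are you there", "hi everyone", "hello all", "i think so", "i am bored"]
labels = ["whQuestion", "whQuestion", "ynQuestion", "ynQuestion",
          "Greet", "Greet", "Statement", "Statement"]

model = FirstWordsNaiveBayes(k=2).fit(posts, labels)
print(model.predict("what is up"))  # → whQuestion
print(model.predict("hi there"))    # → Greet
```

Varying `k` from 2 to 6 mirrors the model variants mentioned in the abstract. Part-of-speech tags could be added as further positional features produced by any tagger; the sketch omits them to stay self-contained, and a decision tree learner could consume the same positional features.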

Similar resources

Posting Act Tagging Using Transformation-Based Learning

In this article we present the application of transformation-based learning (TBL) [1] to the task of assigning tags to postings in online chat conversations. We define a list of posting tags that have proven useful in chat-conversation analysis. We describe the templates used for posting act tagging in the context of template selection. We extend traditional approaches used in part-of-speech ta...
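
The greedy, error-driven loop of transformation-based learning (as popularized by Brill for part-of-speech tagging) can be sketched for posting tags as follows. This is a hypothetical, minimal illustration and not the cited article's method or templates: it uses a single made-up rule template (change tag A to B when a post's first word is W), a most-frequent-tag initial annotator, whitespace tokenization, and toy posts with illustrative tags. The functions `first_word`, `learn_tbl`, and `apply_tbl` are hypothetical names.

```python
from collections import Counter


# Hypothetical single template: "change tag A to B if the post's first word is W".
def first_word(post: str) -> str:
    tokens = post.lower().split()
    return tokens[0] if tokens else "<none>"


def learn_tbl(posts, gold, max_rules=10, min_gain=1):
    """Greedy transformation-based learning over one hypothetical rule template."""
    # Initial-state annotator: every post gets the most frequent training tag.
    baseline = Counter(gold).most_common(1)[0][0]
    current = [baseline] * len(posts)
    rules = []
    for _ in range(max_rules):
        # Instantiate the template only where the current tagging is wrong;
        # sorting makes tie-breaking deterministic.
        candidates = sorted({(current[i], gold[i], first_word(posts[i]))
                             for i in range(len(posts)) if current[i] != gold[i]})
        best_rule, best_gain = None, 0
        for a, b, w in candidates:
            gain = 0
            for tag, truth, post in zip(current, gold, posts):
                if tag == a and first_word(post) == w:
                    if truth == b:
                        gain += 1  # an error would be fixed
                    elif truth == a:
                        gain -= 1  # a correct tag would be broken
            if gain > best_gain:
                best_rule, best_gain = (a, b, w), gain
        if best_rule is None or best_gain < min_gain:
            break  # no remaining rule improves accuracy enough
        a, b, w = best_rule
        current = [b if (tag == a and first_word(p) == w) else tag
                   for tag, p in zip(current, posts)]
        rules.append(best_rule)
    return baseline, rules


def apply_tbl(post, baseline, rules):
    """Tag a new post: start from the baseline tag, then apply rules in learned order."""
    tag, w = baseline, first_word(post)
    for a, b, word in rules:
        if tag == a and w == word:
            tag = b
    return tag


# Toy data: hypothetical posts and tags for illustration only.
posts = ["what time is it", "what are you doing", "do you like jazz", "are you there",
         "hi everyone", "hello all", "i think so", "i am bored", "ok"]
gold = ["whQuestion", "whQuestion", "ynQuestion", "ynQuestion",
        "Greet", "Greet", "Statement", "Statement", "Statement"]

baseline, rules = learn_tbl(posts, gold)
print(rules[0])                                  # → ('Statement', 'whQuestion', 'what')
print(apply_tbl("what is up", baseline, rules))  # → whQuestion
```

Each learned rule is kept only if it yields a net reduction in training errors, which is what makes the procedure error-driven; richer templates would condition on more context than the first word.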

Noisy Text Analytics

Text produced by processing signals intended for human use is often noisy for automated computer processing. Digital text produced in informal settings such as online chat, SMS, emails, tweets, message boards, newsgroups, blogs, wikis and web pages contains considerable noise. Also processing techniques like Automatic Speech Recognition, Optical Character Recognition and Machine Translation intr...

Using Speech Act Profiling for Deception Detection

The rising use of synchronous text-based computer-mediated communication (CMC) such as chat rooms and instant messaging in government agencies and the business world presents a potential risk to these organizations. There are no current methods for visualizing or analyzing these persistent conversations to detect deception. Speech act profiling is a method for analyzing and visualizing online c...

Automatic Discovery of Speech Act Categories in Educational Games

In this paper we address the important task of automated discovery of speech act categories in dialogue-based, multi-party educational games. Speech acts are important in dialogue-based educational systems because they help infer the student speaker’s intentions (the task of speech act classification) which in turn is crucial to providing adequate feedback and scaffolding. A key step in the spe...

Greeting Speech Act Forms in Iranian Junior High School Textbooks: Prospect Series vs Four Corners Series

The present study is an attempt to compare the use of different forms of greeting speech acts presented in Iranian junior high school textbooks, i.e. Prospect series (I, II, III) and Four Corners series (1, 2, 3 and 4) which are quite popular in Iranian high schools and institutions. To this end, greeting forms in the language function and conversation sections of these two series were analyzed...

Publication date: 2011